Acoustic Model Compression with MAP adaptation
Abstract
Speaker adaptation is an important step in optimizing and personalizing the performance of automatic speech recognition (ASR) for individual users. While many applications target rapid adaptation through various global transformations, slower adaptation that reaches a higher level of personalization would be useful for many active ASR users, especially those whose speech is not recognized well. This paper studies the outcome of combining maximum a posteriori (MAP) adaptation with compression of Gaussian mixture models. An important result that has not received much previous attention is how MAP adaptation can be used to radically decrease the size of the models as they are tuned to a particular speaker. This is particularly relevant for small personal devices, which should provide accurate recognition in real time despite low memory, computation, and power budgets. With our method we are able to decrease the model complexity with MAP adaptation while increasing the accuracy.
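For readers unfamiliar with MAP adaptation of Gaussian mixture models, the following is a minimal sketch of the standard relevance-MAP update of the component means, assuming diagonal covariances. It is not taken from the paper: the function name `map_adapt_means`, the relevance factor `tau`, and its default value are illustrative assumptions. Each adapted mean is an interpolation between the prior (speaker-independent) mean and the adaptation data, weighted by how many frames the component accounts for.

```python
import numpy as np

def map_adapt_means(means, variances, weights, data, tau=10.0):
    """Relevance-MAP update of diagonal-covariance GMM means.

    means, variances: (K, D) prior component parameters
    weights:          (K,)   prior mixture weights
    data:             (T, D) adaptation frames
    tau:              relevance factor (hypothetical default, not from the paper)
    """
    # Log-density of every frame under every component: shape (T, K).
    diff = data[:, None, :] - means[None, :, :]
    log_prob = -0.5 * (np.sum(diff ** 2 / variances, axis=2)
                       + np.sum(np.log(2.0 * np.pi * variances), axis=1))

    # Component posteriors per frame, normalized in a numerically stable way.
    log_post = np.log(weights) + log_prob
    log_post -= log_post.max(axis=1, keepdims=True)
    post = np.exp(log_post)
    post /= post.sum(axis=1, keepdims=True)

    # Soft occupancy counts (K,) and first-order statistics (K, D).
    counts = post.sum(axis=0)
    first_order = post.T @ data

    # Interpolate between the prior mean and the data, weighted by occupancy.
    new_means = (tau * means + first_order) / (tau + counts)[:, None]
    return new_means, counts
```

Components that see little adaptation data stay close to their prior means, while well-observed components move toward the speaker's data. The occupancy counts returned here are the kind of statistic a compression step could consult; how the paper actually selects or merges components is not stated in the abstract.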
Related resources
Speaker Adaptation in Continuous Speech Recognition Using MLLR-Based MAP Estimation
A variety of methods are used for speaker adaptation in speech recognition. In some techniques, such as MAP estimation, only the models with available training data are updated. Hence, large amounts of training data are required in order to have significant recognition improvements. In some others, such as MLLR, where several general transformations are applied to model clusters, the results ar...
Multifactor adaptation for Mandarin broadcast news and conversation speech recognition
We explore the integration of multiple factors such as genre and speaker gender for acoustic model adaptation tasks to improve Mandarin ASR system performance on broadcast news and broadcast conversation audio. We investigate the use of multifactor clustering of acoustic model training data and the application of MPE-MAP and fMPE-MAP acoustic model adaptations. We found that by effectively comb...
Subtitle Phoneme Class Based Adaptation for Mismatch Acoustic Modeling of Distant Noisy Speech (Preprint)
A new adaptation strategy for distant noisy speech is created by phoneme class based approaches for context-independent acoustic models. Unlike previous approaches such as MLLR-MAP adaptation, which adapt the acoustic model to the features, our phoneme-class based adaptation (PCBA) adapts the distant data features to our acoustic model, which has been trained on close-microphone TIMIT sentences. The ...
Gaussian Map based Acoustic Model Adaptation Using Untranscribed Data for Speech Recognition in Severely Adverse Environments
This study proposes an acoustic model adaptation scheme to improve speech recognition in severely adverse environments utilizing untranscribed data. In the proposed method, a clean GMM is estimated from clean training data, and a noise-corrupted GMM is obtained by MAP adaptation over the adaptation data. The Gaussian component of the adapted HMMs is obtained using the transform of the most simil...